Germans Trias i Pujol Research Institute and Hospital (IGTP)
Badalona
March 19, 2025
1. Population and sample
2. Central Limit Theorem
3. Confidence Interval
4. Hypothesis test
5. Statistical vs clinical significance
Population: a set of elements that have one or more characteristics that can be observed in common - these are known as inclusion criteria.
Example: The set of adults (>17 years) with hypertension.
Finite or infinite.
Sample: a subset of elements of a population.
It is finite and of a reasonable size.
Representative or convenience.
Statistical inference is the set of methods that allow conclusions to be drawn about a population from a sample.
This sample must be representative and made up of randomly selected individuals to avoid bias.
We will use characteristics of the sample to infer the characteristics of the population.
Parameter: A numerical value that describes a characteristic of a population.
Population mean: \[ \mu = \frac{\sum_{i=1}^{N} x_i}{N} \]
It’s constant and usually unknown.
Statistic: A numerical value calculated from a sample that describes a characteristic of the sample.
Sample mean: \[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
It’s variable and can be calculated.
Sample mean: \[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
Standard deviation: \[ S_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i-\bar{x})^2}{n-1}} \]
Sample proportion: \[ \hat{p} = \frac{x}{n} \] where \(x\) is the number of sample subjects with the characteristic of interest.
How to go from sample statistic to population parameter?
Note
The key concept is the sampling distribution: the probability distribution of a statistic.
Weight measured on a random sample of 10 subjects:
- Sample #1
[1] 94.3 80.3 69.7 77.0 75.7 102.2 79.7 101.5 99.7 59.1
Mean: 83.9
- Sample #2
[1] 95.2 101.2 94.4 73.3 84.4 70.1 74.8 62.9 89.6 73.6
Mean: 82
- Sample #3
[1] 72.7 70.8 82.9 70.2 73.9 76.2 100.7 77.1 81.3 88.0
Mean: 79.4
- Sample #4
[1] 62.0 107.0 81.1 79.5 77.3 52.4 69.7 86.6 60.0 70.8
Mean: 74.6
| Characteristic | N = 4¹ |
|---|---|
| Weight (Kg) | 80.0 (4.0) |

¹ Mean (SD)
… 50 samples:

| Characteristic | N = 50¹ |
|---|---|
| Weight (Kg) | 79.7 (4.3) |

¹ Mean (SD)

… 100 samples:

| Characteristic | N = 100¹ |
|---|---|
| Weight (Kg) | 80.6 (5.0) |

¹ Mean (SD)

… 500 samples:

| Characteristic | N = 500¹ |
|---|---|
| Weight (Kg) | 79.8 (4.8) |

¹ Mean (SD)

… 2,000 samples:

| Characteristic | N = 2,000¹ |
|---|---|
| Weight (Kg) | 79.9 (4.8) |

¹ Mean (SD)
For large enough \(n\), the sampling distribution of \(\bar{x}\) tends to a normal distribution with mean \(\mu\) and standard deviation \(\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\), where \(\sigma\) is the population standard deviation.
\[ \bar{x} \sim Normal\left(\mu,\frac{\sigma}{\sqrt{n}}\right) \] where \(n\) is the number of subjects in each sample.
\[ \scriptsize x \sim Normal(\mu=80,\sigma=15) \]
\[ \scriptsize \bar{x} \sim Normal(\mu,\frac{\sigma}{\sqrt{n}}) \]
\[ \scriptsize \sigma_{\bar{x}}=\frac{\sigma}{\sqrt{n}}=\frac{15}{\sqrt{10}}=4.7 \]
\[ \scriptsize \bar{x} \sim Normal(\mu=80,\ \sigma_{\bar{x}}=4.7) \]
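The course works in R, but the same check can be sketched with Python's standard library alone: draw 2,000 samples of \(n = 10\) from \(Normal(\mu=80,\ \sigma=15)\) and confirm that the standard deviation of the sample means approaches \(\sigma/\sqrt{n} \approx 4.7\). All numbers here mirror the example above; the seed is an arbitrary choice.

```python
import random
import statistics

random.seed(42)

mu, sigma, n, n_samples = 80, 15, 10, 2000

# Draw many samples of size n and record each sample mean
sample_means = [
    statistics.mean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(n_samples)
]

# The means cluster around mu, with spread close to sigma / sqrt(n) = 4.74
print(round(statistics.mean(sample_means), 1))   # close to 80
print(round(statistics.stdev(sample_means), 1))  # close to 4.7
```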
| Characteristic | N = 10¹ |
|---|---|
| Rare | 374.9 (4.9) |

¹ Mean (SD)

| Characteristic | N = 30¹ |
|---|---|
| Rare | 374.9 (3.1) |

¹ Mean (SD)

| Characteristic | N = 50¹ |
|---|---|
| Rare | 374.7 (3.9) |

¹ Mean (SD)
What if we modify the sample size…
| Characteristic | N = 1,000¹ |
|---|---|
| Rare | 374.9 (13.4) |

¹ Mean (SD)

| Characteristic | N = 1,000¹ |
|---|---|
| Rare | 375.1 (4.0) |

¹ Mean (SD)

| Characteristic | N = 1,000¹ |
|---|---|
| Rare | 375.0 (0.4) |

¹ Mean (SD)
The standard error (SE) measures how much a sample statistic varies from sample to sample.
\[ \text{SE of the mean} = \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]
…estimated in a sample by:
\[ \text{SE of the mean} = s_{\bar{x}} = \frac{s}{\sqrt{n}} \]
- Sample:
[1] 88.0 50.7 111.4 65.4 83.3 81.0 44.0 82.8 71.8 99.9
- Mean:
[1] 77.84014
- Standard Deviation:
[1] 20.70304
- Standard Error:
[1] 6.546876
\[ \text{percentile}_{2.5} \text{ of } N(\mu=77.8,\ \sigma=6.5) = \bar{x} - 1.96\, s_{\bar{x}} = 65.0 \]
\[ \text{percentile}_{97.5} \text{ of } N(\mu=77.8,\ \sigma=6.5) = \bar{x} + 1.96\, s_{\bar{x}} = 90.7 \]
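As a hedged illustration (in Python rather than the course's R, standard library only), the statistics above can be recomputed from the printed sample; small differences from the R output come from the sample being displayed rounded to one decimal.

```python
import math
import statistics

# Sample weights as printed on the slide (rounded to 1 decimal)
sample = [88.0, 50.7, 111.4, 65.4, 83.3, 81.0, 44.0, 82.8, 71.8, 99.9]

n = len(sample)
mean = statistics.mean(sample)   # ~77.8
sd = statistics.stdev(sample)    # sample SD (n - 1 denominator), ~20.7
se = sd / math.sqrt(n)           # standard error of the mean, ~6.5

print(round(mean, 2), round(sd, 2), round(se, 2))
```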
A range of values, derived from a sample, that is likely to contain the true population parameter with a specified confidence level (e.g., 95% or 99%).
\[ \text{Mean Confidence interval 95%} = \bar{x} \pm Z_{\alpha/2} \frac{s}{\sqrt{n}} \] Where \(Z \sim N(0,1)\)
Given 100 independent samples and their 95% confidence intervals of the mean, around 95 intervals would contain the true value.
95% refers only to how often confidence intervals computed from many studies would contain the true value.
Given 100 independent samples and their 95% confidence intervals of the mean, around 5 would NOT contain the true value.
95% refers only to how often confidence intervals computed from many studies would contain the true value.
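This repeated-sampling interpretation can be simulated. The sketch below (Python, standard library only; the population \(\mu = 80\), \(\sigma = 15\) and sample size \(n = 30\) are illustrative choices, not from the slides) builds a 95% CI around each of 1,000 sample means and counts how many capture the true \(\mu\) — the proportion should be close to 0.95.

```python
import math
import random
import statistics

random.seed(1)

mu, sigma, n, reps = 80, 15, 30, 1000
z = 1.96  # normal 97.5th percentile, used as an approximation

covered = 0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    # 95% CI: xbar +/- z * SE; count whether it captures the true mean
    if xbar - z * se <= mu <= xbar + z * se:
        covered += 1

print(covered / reps)  # close to 0.95
```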
To be honest… you should use Student's t distribution.
\[ \text{Mean confidence interval} = \bar{x} \pm t \frac{s}{\sqrt{n}} \]
Where \(t = t_{1-\frac{\alpha}{2},\,(n-1)}\) is the \(1-\frac{\alpha}{2}\) quantile of the t distribution with \(n-1\) degrees of freedom.
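A minimal sketch of the t-based interval applied to the weight sample shown earlier. Since the Python standard library has no t quantile function, the critical value \(t_{0.975,\,9} \approx 2.262\) is hardcoded here; in R one would use `qt(0.975, df = 9)`.

```python
import math
import statistics

sample = [88.0, 50.7, 111.4, 65.4, 83.3, 81.0, 44.0, 82.8, 71.8, 99.9]

n = len(sample)
xbar = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(n)

t_crit = 2.262  # t quantile for 97.5%, df = n - 1 = 9

lower, upper = xbar - t_crit * se, xbar + t_crit * se
print(round(lower, 1), round(upper, 1))  # wider than the z-based [65.0, 90.7]
```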
What % of Uganda’s roads are paved?
CIA knows: https://www.cia.gov/the-world-factbook/
Total: 20,544 km (excludes local roads)
Paved: 4,257 km (20.7%)
Is our estimate reliable or biased?
A statistical method to assess whether the evidence in a sample supports or contradicts a claim about a population parameter.
The null hypothesis \(H_{0}\) is the assumption tested in statistical analysis, stating that a population parameter meets a specific condition, such as no effect or no difference.
\[ H_{0}: \pi_{\text{paved}} = 20.7\% \]
The alternative hypothesis \(H_{1}\) is a statement that contradicts the null hypothesis, proposing that there is an effect, a difference, or an association in the population.
\[ H_{1}: \pi_{\text{paved}} \ne 20.7\% \]
A Type I error occurs when the null hypothesis is rejected despite being true, with its probability defined by the significance level (\(\alpha\)).
\(\alpha=0.05\) ; \(\alpha=0.01\)
This risk should be considered before conducting the test.
A Type II error occurs when the null hypothesis is not rejected despite being false, with its probability denoted as \(\beta\); its complement \((1 - \beta)\) is called statistical power.
\(\text{power}=(1-\beta)=0.8\) ; \(\text{power}=(1-\beta)=0.9\)
This risk should be considered before conducting the test.
Null hypothesis: 20.7%
Our data says: 5 events out of 20 possible events.
So, the % of paved roads is 25%, with a 95% CI [11.2% to 46.9%].
Is this data consistent with our hypothesis?
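The interval [11.2% to 46.9%] is consistent with a Wilson score interval without continuity correction (what R's `prop.test(5, 20, correct = FALSE)` reports); a minimal Python sketch:

```python
import math

def wilson_ci(x, n, z=1.96):
    """95% Wilson score interval for a proportion x/n (no continuity correction)."""
    p = x / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

lo, hi = wilson_ci(5, 20)
print(round(100 * lo, 1), round(100 * hi, 1))  # ~11.2 46.9
```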
\[ H_{0}: \pi_{\text{paved}} = 20.7\% \] \[ H_{1}: \pi_{\text{paved}} \ne 20.7\% \]
Is the difference \(p_{\text{paved}} - 20.7\% = 0\)?
Under the null hypothesis the difference is 0, with standard error \( \frac{\sigma_{p_{\text{paved}}}}{\sqrt{n}} = \sqrt{\frac{\pi_0(1-\pi_0)}{n}} \), where \(\sigma_{p_{\text{paved}}} = \sqrt{\pi_0(1-\pi_0)}\) is the Bernoulli standard deviation.
Thus, the difference distribution is \( \sim N\left(\mu=0,\ \sigma=\sqrt{\frac{\pi_0(1-\pi_0)}{n}}\right) \).
Proportion difference distribution under the null hypothesis.
How probable is it to observe a difference of \(p_{\text{paved}} - 20.7\% = 4.3\%\) under the null hypothesis?
Given the predefined Type I error rate, we fail to reject the null hypothesis because the probability of observing a difference at least as extreme as 4.3% under the null hypothesis is 0.635 > 0.05.
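That p-value can be reproduced with a one-proportion z-test (normal approximation, no continuity correction); a minimal Python sketch using only the standard library:

```python
import math
from statistics import NormalDist

pi0 = 0.207   # hypothesized proportion of paved roads
x, n = 5, 20  # observed paved segments out of 20
p = x / n

se = math.sqrt(pi0 * (1 - pi0) / n)  # SE of the difference under H0
z = (p - pi0) / se                   # standardized difference, ~0.47

p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
print(round(p_value, 3))  # ~0.635
```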
Null hypothesis: 20.7%
Our data says: 10 events out of 20 possible events.
So, the % of paved roads is 50%, with a 95% CI [29.9% to 70.1%].
Is this data consistent with our hypothesis?
Proportion difference distribution under the null hypothesis.
Given the predefined Type I error rate, we reject the null hypothesis because the probability of observing a difference at least as extreme as 29.3% under the null hypothesis is 0.0012 \(\le\) 0.05.
Null hypothesis: 20.7%
Our data says: 100 events out of 400 possible events.
So, the % of paved roads is 25%, with a 95% CI [21.0% to 29.5%].
Is this data consistent with our hypothesis?
Proportion difference distribution under the null hypothesis.
Given the predefined Type I error rate, we reject the null hypothesis because the probability of observing a difference at least as extreme as 4.3% under the null hypothesis is 0.0338 \(\le\) 0.05.
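Putting the three scenarios side by side shows that the p-value depends on both the size of the difference and the sample size: the same observed 25% is non-significant with n = 20 but significant with n = 400. A self-contained Python sketch of the one-proportion z-test (normal approximation, no continuity correction):

```python
import math
from statistics import NormalDist

def prop_z_test(x, n, pi0=0.207):
    """Two-sided one-proportion z-test against pi0 (no continuity correction)."""
    p = x / n
    se = math.sqrt(pi0 * (1 - pi0) / n)  # SE of the difference under H0
    z = (p - pi0) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# The three slide scenarios: 5/20 (25%), 10/20 (50%), 100/400 (25%)
for x, n in [(5, 20), (10, 20), (100, 400)]:
    print(x, n, round(prop_z_test(x, n), 4))
```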
Misconception: “The P value is the probability that the test hypothesis is true.”
Misconception: “The P value is the probability that chance alone produced the observed association.”
Correct: the P value is the probability of obtaining a result as extreme as the observed one (or more extreme) under the assumption that the null hypothesis is true.
In clinical research, the use of hypothesis testing and p-values as a criterion of relevance has become widespread to the point of overuse.
Statistical significance, as defined above, indicates whether an observed result is unlikely under the null hypothesis, usually using cut-off points such as 0.05 or 0.01.
However, a statistically significant result does not indicate the size of the effect or its clinical relevance.
The clinical significance of a finding is determined by assessing whether the effect is large enough to influence medical practice or decision making.
P-values do not measure the probability that the hypothesis being tested is true, or the probability that the data were generated by chance alone.
Scientific conclusions and business or policy decisions should not be based solely on the fact that the p-value exceeds a certain threshold.
A p-value, or statistical significance, does not measure the size of an effect or the importance of an outcome.
Ronald L. Wasserstein & Nicole A. Lazar (2016): The ASA’s Statement on p-Values: Context, Process, and Purpose. The American Statistician, 70(2), 129–133.
Inference is a SUPER POWER.
Report confidence intervals.
Handle p-values with care.
Mind the assumptions.
Report the effect size.
Applied Biostatistics Course with R